Smart Paper Notes

Atrial Fibrillation Recurrence Risk Prediction from 12-lead ECG Recorded Pre- and Post-Ablation Procedure

Eran Zvuloni , Sheina Gendelman , Sanghamitra Mohanty , Jason Lewen , Andrea Natale , Joachim A. Behar

Category: Machine Learning

2022-08-22

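Introduction: A 12-lead electrocardiogram (ECG) is recorded during the atrial fibrillation (AF) catheter ablation procedure (CAP). Without a long follow-up assessing AF recurrence (AFR), it is not easy to determine whether CAP was successful. An AFR risk prediction algorithm could therefore enable better management of CAP patients. In this research, we extract features from 12-lead ECG recorded before and after CAP and train a machine learning model for AFR risk prediction. Methods: Pre- and post-CAP segments were extracted from 112 patients. The analysis included a signal quality criterion, heart rate variability, and morphological biomarkers engineered from the 12-lead ECG (804 features overall). The AFR clinical endpoint was available for 43 of the 112 patients. These were used to assess the feasibility of AFR risk prediction using either pre- or post-CAP features. A random forest classifier was trained within a nested cross-validation framework. Results: 36 features were found to be statistically significant for distinguishing between the pre- and post-procedure states (n = 112). For classification, the area under the receiver operating characteristic curve (AUROC) was reported, with AUROC_pre = 0.64 and AUROC_post = 0.74 (n = 43). Discussion and conclusions: This preliminary analysis shows the feasibility of AFR risk prediction. Such a model could be used to improve CAP management.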
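
The AUROC values quoted above can be read as the probability that a randomly chosen patient with recurrence receives a higher risk score than a randomly chosen patient without it. The following is a minimal sketch of that rank-based computation, assuming binary labels and real-valued scores. The function name `auroc` and the toy data are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """Rank-based AUROC: chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study): 1 = AF recurrence, 0 = no recurrence.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(auroc(labels, scores))
```

In this toy case 10 of the 12 positive-negative pairs are ordered correctly, so the score is 10/12. A value of 0.5 corresponds to random ranking.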

translated by Google Translate

Accu-Help: A Machine Learning based Smart Healthcare Framework for Accurate Detection of Obsessive Compulsive Disorder

Kabita Patel , Ajaya Kumar Tripathy , Laxmi Narayan Padhy , Sujita Kumar Kar , Susanta Kumar Padhy , Saraju Prasad Mohanty

Category: Machine Learning

2022-12-05

In recent years, the importance of smart healthcare cannot be overstated. The current work proposes to expand the state of the art of smart healthcare by integrating solutions for Obsessive Compulsive Disorder (OCD). Identification of OCD from oxidative stress biomarkers (OSBs) using machine learning is an important development in the study of OCD. However, this process involves collecting OCD class labels from hospitals, collecting the corresponding OSBs from biochemical laboratories, creating an integrated and labeled dataset, using a suitable machine learning algorithm to design an OCD prediction model, and making the prediction model available to different biochemical laboratories for OCD prediction on unlabeled OSBs. Further, as the volume of labeled samples in the dataset grows significantly over time, the prediction model must be redesigned for further use. The whole process requires distributed data collection, data integration, coordination between the hospital and the biochemical laboratory, dynamic design of the OCD prediction model using a suitable machine learning algorithm, and making the model available to the biochemical laboratories. Keeping all these things in mind, Accu-Help, a fully automated, smart, and accurate OCD detection conceptual model, is proposed to help biochemical laboratories detect OCD efficiently from OSBs. OSBs are classified into three classes: Healthy Individual (HI), OCD Affected Individual (OAI), and Genetically Affected Individual (GAI). The main component of the proposed framework is the design of the machine learning OCD prediction model. In Accu-Help, a neural network-based approach is presented with an OCD prediction accuracy of 86 percent.

DistGNN-MB: Distributed Large-Scale Graph Neural Network Training on x86 via Minibatch Sampling

Md Vasimuddin , Ramanarayan Mohanty , Sanchit Misra , Sasikanth Avancha

Category: Machine Learning

2022-11-11

Training Graph Neural Networks, on graphs containing billions of vertices and edges, at scale using minibatch sampling poses a key challenge: strong-scaling graphs and training examples results in lower compute and higher communication volume and potential performance loss. DistGNN-MB employs a novel Historical Embedding Cache combined with compute-communication overlap to address this challenge. On a 32-node (64-socket) cluster of $3^{rd}$ generation Intel Xeon Scalable Processors with 36 cores per socket, DistGNN-MB trains 3-layer GraphSAGE and GAT models on OGBN-Papers100M to convergence with epoch times of 2 seconds and 4.9 seconds, respectively, on 32 compute nodes. At this scale, DistGNN-MB trains GraphSAGE 5.2x faster than the widely-used DistDGL. DistGNN-MB trains GraphSAGE and GAT 10x and 17.2x faster, respectively, as compute nodes scale from 2 to 32.
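
To put the quoted speedups in context, strong-scaling efficiency divides the measured speedup by the ideal linear speedup, which is 16x when going from 2 to 32 nodes. The sketch below does only this arithmetic on the figures in the abstract. The helper name `scaling_efficiency` is hypothetical, and the calculation assumes the 2-node and 32-node runs are otherwise comparable, which the abstract does not detail.

```python
def scaling_efficiency(speedup, nodes_before, nodes_after):
    """Strong-scaling efficiency: measured speedup over the ideal (linear) speedup."""
    ideal = nodes_after / nodes_before
    return speedup / ideal


# Speedups quoted in the abstract for scaling from 2 to 32 compute nodes.
graphsage = scaling_efficiency(10.0, 2, 32)
gat = scaling_efficiency(17.2, 2, 32)
print(f"GraphSAGE: {graphsage:.3f}, GAT: {gat:.3f}")
```

Under these assumptions, GraphSAGE reaches about 62.5% of linear scaling, while the GAT figure works out slightly above 100%.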

Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity

Faisal Hamman , Jiahao Chen , Sanghamitra Dutta

Category: Artificial Intelligence | Machine Learning

2022-11-03

Existing regulations prohibit model developers from accessing protected attributes (gender, race, etc.), often resulting in fairness assessments on populations without knowing their protected groups. In such scenarios, institutions often adopt a separation between the model developers (who train models with no access to the protected attributes) and a compliance team (who may have access to the entire dataset for auditing purposes). However, the model developers might be allowed to test their models for bias by querying the compliance team for group fairness metrics. In this paper, we first demonstrate that simply querying for fairness metrics, such as statistical parity and equalized odds, can leak the protected attributes of individuals to the model developers. We demonstrate that there always exist strategies by which the model developers can identify the protected attribute of a targeted individual in the test dataset from just a single query. In particular, we show that one can reconstruct the protected attributes of all the individuals from $O(N_k \log(n/N_k))$ queries when $N_k \ll n$ using techniques from compressed sensing ($n$: size of the test dataset, $N_k$: size of the smallest group). Our results pose an interesting debate in algorithmic fairness: should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or does it constitute a violation of regulations and privacy if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this supposed violation, we also propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query, outperforming naive techniques such as the Laplace mechanism. We also include experimental results on the Adult dataset and synthetic data (broad range of parameters).
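
One way to see why a single fairness query can leak a protected attribute is the sketch below. It is an illustrative construction under stated assumptions, not necessarily the strategy used in the paper: the attribute is binary, the compliance team returns the exact signed statistical parity gap, and the developer may submit arbitrary predictions. All function names and the toy data are hypothetical.

```python
def statistical_parity_gap(predictions, protected):
    """Compliance-side query: P(yhat=1 | A=0) - P(yhat=1 | A=1)."""
    group0 = [p for p, a in zip(predictions, protected) if a == 0]
    group1 = [p for p, a in zip(predictions, protected) if a == 1]
    return sum(group0) / len(group0) - sum(group1) / len(group1)


def infer_attribute(target_index, n, query):
    """Developer-side attack: mark only the target positive, then read the gap's sign."""
    predictions = [1 if i == target_index else 0 for i in range(n)]
    gap = query(predictions)
    # A positive gap means the single positive prediction fell in group 0.
    return 0 if gap > 0 else 1


# Hypothetical hidden attributes, visible only to the compliance team.
hidden = [0, 1, 1, 0, 1, 0, 0, 1]


def oracle(predictions):
    """Stands in for the compliance team answering one fairness query."""
    return statistical_parity_gap(predictions, hidden)


recovered = [infer_attribute(i, len(hidden), oracle) for i in range(len(hidden))]
print(recovered == hidden)
```

This naive loop spends one query per individual. The compressed-sensing result quoted in the abstract concerns recovering all attributes with far fewer queries, which this sketch does not attempt.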

Integrating connection search in graph queries

Angelos Christos Anadiotis , Ioana Manolescu , Madhulika Mohanty

Category: Artificial Intelligence

2022-08-09

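Graph data management and querying have many practical applications. When graphs are very heterogeneous and/or users are unfamiliar with their structure, users may need to find how two or more groups of nodes are connected in a graph, even when they cannot describe the connections. This is only partially supported by existing query languages, which allow searching for paths but not for trees connecting three or more node groups. The latter is related to the NP-hard Group Steiner Tree problem and has previously been considered for keyword search in databases. In this work, we formally show how to integrate connecting tree patterns (CTPs, in short) within a graph query language such as SPARQL or Cypher, leading to an Extended Query Language (EQL, in short). We then study a set of algorithms for evaluating CTPs; we generalize prior keyword search work, most importantly by (i) considering bidirectional edge traversal and (ii) allowing users to select any score function for ranking CTP results. To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases in which our algorithm, MOLESP, is complete even with pruning. Our experiments validate the performance of our CTP and EQL evaluation algorithms on a large set of synthetic and real-world workloads.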
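
The role of bidirectional edge traversal can be illustrated on the simplest case, a path connecting two node groups. The sketch below is a plain breadth-first search, not the MOLESP algorithm from the paper: it handles only two groups, applies no score function, and performs no pruning. The function name `connect` and the toy graph are hypothetical.

```python
from collections import deque


def connect(edges, sources, targets):
    """Shortest path from any source to any target, ignoring edge direction."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)  # also traverse the edge backwards
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:  # walk back to the source that reached it
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # the two groups are not connected


# Hypothetical directed edges: alice -> acme, bob -> acme, bob -> carol.
edges = [("alice", "acme"), ("bob", "acme"), ("bob", "carol")]
print(connect(edges, ["alice"], {"carol"}))
```

Following edge directions only, `alice` cannot reach `carol`, because `acme` has no outgoing edge. Traversing edges in both directions finds the connection through `acme` and `bob`.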

Robust Counterfactual Explanations for Tree-Based Ensembles

Sanghamitra Dutta , Jason Long , Saumitra Mishra , Cecilia Tilli , Daniele Magazzeni

Category: Machine Learning | Artificial Intelligence

2022-07-06

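Counterfactual explanations inform ways to achieve a desired outcome from a machine learning model. However, such explanations are not robust to certain real-world changes in the underlying model (e.g., retraining the model, changing hyperparameters, etc.), which calls into question their reliability in several applications, e.g., credit lending. In this work, we propose a novel strategy, which we call RobX, to generate robust counterfactuals for tree-based ensembles, e.g., XGBoost. Tree-based ensembles pose additional challenges in robust counterfactual generation: for example, they have a non-smooth and non-differentiable objective function, and they can change a lot in the parameter space under retraining on very similar data. We first introduce a novel metric, which we call Counterfactual Stability, that attempts to quantify how robust a counterfactual will be to model changes under retraining and that comes with desirable theoretical properties. Our proposed strategy RobX works with any counterfactual generation method (the base method) and searches for robust counterfactuals by iteratively refining the counterfactual generated by the base method using our Counterfactual Stability metric. We compare the performance of RobX with popular counterfactual generation methods (for tree-based ensembles) across benchmark datasets. The results demonstrate that our strategy generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and also realistic (in terms of local outlier factor) compared with existing state-of-the-art methods.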

Fairness via In-Processing in the Over-parameterized Regime: A Cautionary Tale

Akshaj Kumar Veldanda , Ivan Brugere , Jiahao Chen , Sanghamitra Dutta , Alan Mishler , Siddharth Garg

Category: Machine Learning

2022-06-29

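The success of DNNs is driven by the counter-intuitive ability of over-parameterized networks to generalize even when they perfectly fit the training data. In practice, test error often continues to decrease with increasing over-parameterization, a phenomenon referred to as double descent. This allows practitioners to instantiate large models without having to worry about overfitting. Despite these benefits, however, prior work has shown that over-parameterization can exacerbate bias against minority subgroups. Several fairness-constrained DNN training methods have been proposed to address this concern. Here, we critically examine MinDiff, a fairness-constrained training procedure implemented within TensorFlow's Responsible AI Toolkit, which aims to achieve Equality of Opportunity. We show that although MinDiff improves fairness for under-parameterized models, it is likely to be ineffective in the over-parameterized regime. This is because an overfit model with zero training loss is trivially group-wise fair on the training data, creating an "illusion of fairness" and thus turning off the MinDiff optimization (this applies to any disparity-based measure that depends on errors or accuracy; it does not apply to demographic parity). Within specified fairness constraints, under-parameterized MinDiff models can even have lower error than their over-parameterized counterparts (despite baseline over-parameterized models having lower error). We further show that MinDiff optimization is very sensitive to the choice of batch size in the under-parameterized regime. Thus, fair model training using MinDiff requires time-consuming hyperparameter searches. Finally, we suggest using previously proposed regularization techniques, viz. L2, early stopping, and flooding, in conjunction with MinDiff to train fair over-parameterized models.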
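
The "illusion of fairness" can be reproduced with a few lines of arithmetic. The sketch below uses a simplified Equality of Opportunity gap (the absolute difference in true positive rate between two groups), which is an assumption made for illustration and is not the MinDiff loss itself. All names and toy labels are hypothetical.

```python
def true_positive_rate(y_true, y_pred, groups, group):
    """TPR within one group: share of that group's actual positives predicted positive."""
    hits = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
    return sum(hits) / len(hits)


def equal_opportunity_gap(y_true, y_pred, groups):
    """Absolute TPR difference between group 0 and group 1."""
    return abs(true_positive_rate(y_true, y_pred, groups, 0)
               - true_positive_rate(y_true, y_pred, groups, 1))


# Hypothetical toy labels and group memberships.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
y_train = [1, 1, 0, 0, 1, 1, 1, 0]
y_test = [1, 1, 0, 1, 1, 1, 0, 1]

train_pred = list(y_train)              # interpolating model: zero training error
test_pred = [1, 1, 0, 1, 0, 1, 0, 0]    # the same model can still err on unseen data

print(equal_opportunity_gap(y_train, train_pred, groups))
print(equal_opportunity_gap(y_test, test_pred, groups))
```

With zero training error, every group has a true positive rate of 1 on the training set, so the gap is exactly 0 and a penalty based on it supplies no training signal, even though the gap on held-out data (2/3 in this toy case) can be large.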

Quantifying Feature Contributions to Overall Disparity Using Information Theory

Sanghamitra Dutta , Praveen Venkatesh , Pulkit Grover

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-06-16

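When a machine learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity in order to explain why the bias exists. Toward this, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in the explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change in disparity to quantify its contribution. However, we may not have access to the model or be able to test/audit its outputs for individually varying features. Furthermore, the decision may not always be a deterministic function of the input features (e.g., with a human in the loop). For these situations, we might need to explain contributions using purely distributional (i.e., observational) techniques rather than interventional ones. We ask the question: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible? We first provide canonical examples (thought experiments) that illustrate the difference between distributional and interventional approaches to explaining contributions, and when either is better suited. When unable to intervene on the inputs, we quantify the "redundant" statistical dependency about the protected attribute that is present in both the final decision and an individual feature, by leveraging a body of work in information theory called Partial Information Decomposition. We also perform a simple case study to show how this technique could be applied to quantify contributions.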

FRIDA -- Generative Feature Replay for Incremental Domain Adaptation

Sayan Rakshit , Anwesh Mohanty , Ruchika Chavhan , Biplab Banerjee , Gemma Roig , Subhasis Chaudhuri

Category: Computer Vision | Machine Learning

2021-12-28

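In this paper, we tackle the novel problem of incremental unsupervised domain adaptation (IDA). We assume that a labeled source domain and different unlabeled target domains are observed incrementally, with the constraint that only the data of the current domain is available at any one time. The goal is to preserve accuracy on all past domains while generalizing well to the current domain. The IDA setup suffers from abrupt differences among the domains and from the unavailability of past data, including the source domain. Inspired by the notion of generative feature replay, we propose a novel framework called Feature Replay based Incremental Domain Adaptation (FRIDA), which leverages a new incremental generative adversarial network (GAN), called domain-generic auxiliary classification GAN (DGAC-GAN), to seamlessly produce domain-specific feature representations. For domain alignment, we propose a simple extension of the popular domain adversarial neural network (DANN), called DANN-IB, which encourages discriminative, domain-invariant, and task-relevant feature learning. Experimental results on the Office-Home, Office-CalTech, and DomainNet datasets confirm that FRIDA maintains a better stability-plasticity trade-off than methods in the literature.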

Emotions are Subtle: Learning Sentiment Based Text Representations Using Contrastive Learning

Ipsita Mohanty , Ankit Goyal , Alex Dotterweich

Category: Natural Language Processing

2021-12-02

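Contrastive learning techniques have been widely used in computer vision as a means of augmenting datasets. In this paper, we extend the use of these contrastive learning embeddings to sentiment analysis tasks and demonstrate that fine-tuning on these embeddings improves over fine-tuning on BERT-based embeddings, achieving higher benchmarks on sentiment analysis when evaluated on the DynaSent dataset. We also explore how our fine-tuned models perform on cross-domain benchmark datasets. Additionally, we explore upsampling techniques to achieve a more balanced class distribution and further improve results on our benchmark tasks.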
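
The abstract does not say which upsampling technique was used. As one common possibility, the sketch below shows random oversampling with replacement, which duplicates minority-class examples until every class matches the size of the largest one. The function name `upsample`, the seed, and the toy sentences are hypothetical.

```python
import random
from collections import Counter


def upsample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw the missing examples with replacement from the class's own data.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical imbalanced sentiment data.
texts = ["great", "loved it", "fine", "good", "awful", "meh"]
sentiment = ["pos", "pos", "pos", "pos", "neg", "neutral"]
balanced_x, balanced_y = upsample(texts, sentiment)
print(Counter(balanced_y))
```

Every original example is kept, and only the smaller classes gain duplicates. Upsampling is normally applied to the training split alone, so that evaluation data keeps its natural class distribution.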
